[00:00.000 --> 00:07.200]  First of all, thank you so much to the organizers for having me here today live from home to talk
[00:07.200 --> 00:12.260]  to you today about some of my work with the All of Us Research Program. My name is Michelle Holko.
[00:12.260 --> 00:17.380]  I'm currently a Presidential Innovation Fellow, detailed to the NIH, to work with the program,
[00:17.380 --> 00:21.560]  specifically with digital health technologies. I'm going to talk to you about one of the projects
[00:21.560 --> 00:26.920]  I've been working on there since about December last year. So I'm going to give you first an
[00:26.920 --> 00:31.220]  introduction to the All of Us Research Program. I'm going to run pretty fast through those slides
[00:31.220 --> 00:35.540]  so that we can get to the digital health technologies portfolio, but more broadly in
[00:35.540 --> 00:40.220]  the program. And then I'm going to spend the most time talking about the Fitbit, bring your own
[00:40.220 --> 00:44.380]  device data analysis and participant demographics, which is the area that I'm going to be talking
[00:44.380 --> 00:49.740]  about most today. And then we'll close with some opportunity and considerations for digital health
[00:49.740 --> 00:55.000]  technologies in the All of Us Research Program. So just again, to give you a kind of an introduction
[00:55.000 --> 00:59.600]  to the program, in case you're not familiar with it, the All of Us Research Program grew out of
[00:59.600 --> 01:04.460]  the Precision Medicine Initiative, which was founded several years ago in the previous administration
[01:05.000 --> 01:10.620]  and has $1.5 billion of congressional dollars to get the program started and to continue
[01:10.620 --> 01:14.900]  running. And the mission really for the program is to accelerate health research
[01:14.900 --> 01:20.100]  and medical breakthroughs to promote individualized prevention, treatment, and care for
[01:20.100 --> 01:26.060]  all of us. There are three core values to the program that are really important to us.
[01:26.060 --> 01:30.600]  The first is to nurture relationships with one million or more participant partners from all
[01:30.600 --> 01:36.740]  walks of life for decades. And again, diversity is really critical to the program in that. To deliver
[01:36.740 --> 01:42.480]  the largest, richest biomedical data set ever, as well as making it easy to use to a wide range
[01:42.480 --> 01:49.880]  of researchers and to catalyze a robust ecosystem of researchers. So the All of Us Research
[01:49.880 --> 01:56.720]  Program truly is an innovative and ambitious research effort. And here are some reasons why
[01:56.720 --> 02:02.040]  really the diversity at the scale of a million people or more and focusing on, you know, capturing
[02:02.040 --> 02:08.280]  the diversity that makes America, America. Focusing on participants as partners. So not just, you know,
[02:08.280 --> 02:12.820]  taking data from them, but also giving back to them and continuing to engage with them throughout
[02:12.820 --> 02:18.100]  the length of the program. The longitudinal design of the program means that we have the ability to
[02:18.100 --> 02:24.340]  recontact participants, which we've already done in some cases around COVID. And multiple
[02:24.340 --> 02:29.520]  data types. So diversity in participants, but also diversity in data, including electronic health
[02:29.520 --> 02:36.280]  records data, survey data, baseline physical measurements, biospecimens, genomics, and
[02:36.280 --> 02:39.600]  digital health technologies, which is what I'm going to be talking about today.
[02:40.160 --> 02:45.080]  Creating a national open resource for all, which is broadly accessible to all researchers with
[02:45.080 --> 02:50.960]  open source software and tools. And building in security and privacy safeguards for all participants.
[02:51.120 --> 02:57.380]  So some current progress. We opened the doors nationally on May 6, 2018. So far, we've enrolled
[02:57.380 --> 03:03.940]  over 271,000 participants that have not only completed the first initial enrollment step,
[03:03.940 --> 03:09.000]  but have also completed what we call the full protocol. And this represents individuals from
[03:09.000 --> 03:15.900]  all 50 states. Furthermore, greater than 80% of these participants represent one or more of
[03:15.900 --> 03:20.640]  the underrepresented in biomedical research categories, which includes racial and ethnic
[03:20.640 --> 03:26.320]  minorities, you know, living in a rural region, and there are many more. I'm going to go through
[03:26.320 --> 03:31.380]  those in more detail a little bit later. And greater than 50% of the participants are racial
[03:31.380 --> 03:36.960]  and ethnic minorities. We built a significant infrastructure to support the program with
[03:36.960 --> 03:44.060]  100 plus academic groups working with us, the VA, the FQHCs, which are the federally qualified
[03:44.060 --> 03:49.600]  health centers, technology and community partners. We have over 320 clinics enrolling
[03:49.600 --> 03:54.800]  participants, and that number is still expanding. We also have a bilingual website. So the website
[03:54.800 --> 03:59.740]  is available in English and in Spanish, as well as the participant portal app and call center.
[03:59.760 --> 04:06.320]  The biobank at the Mayo Clinic supports 24-hour shipping and processing with capacity for 335
[04:06.320 --> 04:11.080]  million plus vials. And we also have an interactive mobile exhibit that travels the country. The
[04:11.080 --> 04:15.720]  picture is on the bottom right of the screen. Certainly not traveling right now, but
[04:16.840 --> 04:21.900]  usually it is, and it's actually wonderful because it can reach, you know, more rural areas where we
[04:21.900 --> 04:30.200]  don't have partners. So here is just a snapshot of enrollment to date. So you can see enrollment
[04:30.200 --> 04:34.560]  over time in the graph on the right, and then the bubbles kind of tell you where we are right now.
[04:34.560 --> 04:39.360]  Like I said, we have 271,000 that have completed the initial steps of the program, which includes
[04:40.300 --> 04:44.340]  consenting, agreeing to share your EHR data, completing the first three surveys,
[04:44.340 --> 04:49.620]  providing physical measurements, and donating at least one biospecimen. We have many more
[04:49.620 --> 04:55.100]  than that, 353,000 who have completed some part of the program, but they haven't completed all
[04:55.100 --> 05:00.020]  of those kind of initial steps. This is what the current protocol looks like. The first step is to
[05:00.020 --> 05:05.460]  enroll, to provide your informed consent, and to authorize your EHR. It's really important that we,
[05:05.460 --> 05:10.620]  you know, ask for the authorization of EHR up front at this initial step. Right now we're
[05:10.620 --> 05:16.600]  only recruiting adults who are 18 years and older. We do have plans to expand to children, but right
[05:16.600 --> 05:22.020]  now we're only recruiting 18 years and older. It's also important to note that at this initial
[05:22.020 --> 05:26.800]  enrollment, we have in-person enrollment and we have digital enrollment capabilities. Of course,
[05:26.800 --> 05:31.940]  right now the in-person enrollments have all been paused. So, in that interim, we've still
[05:31.940 --> 05:37.900]  been able to accept participants joining the program, but only through the digital method.
[05:38.180 --> 05:42.120]  Then you can answer surveys. There are many different surveys that are already available,
[05:42.120 --> 05:47.280]  including the basics, overall health, lifestyle, health care access and utilization, family medical
[05:47.280 --> 05:54.160]  history, personal health history. And we also released a survey around the coronavirus pandemic
[05:54.160 --> 05:59.940]  called the COPE survey, which focuses not only on signs and symptoms and exposure to coronavirus,
[05:59.940 --> 06:06.380]  but also, you know, the social and economic aspects of that. The next step, the next two steps actually
[06:06.380 --> 06:10.840]  kind of usually go hand in hand. And so usually if you're coming to us through one of our health
[06:10.840 --> 06:14.660]  care provider organizations, you'll provide physical measurements, which are all listed
[06:14.660 --> 06:19.060]  here. Blood pressure, heart rate, height, weight, BMI, hip circumference, and waist circumference. At the
[06:19.060 --> 06:25.300]  same time, you'll provide a bio sample. Usually blood, that's the bio sample of choice. However,
[06:25.300 --> 06:30.200]  if the blood draw is unsuccessful, or if you're coming to us as a direct volunteer digitally,
[06:30.200 --> 06:37.260]  we can send you a kit to give a saliva sample. Very few participants give urine specimens. We
[06:37.260 --> 06:41.560]  are collecting some of those. And then finally, digital health technologies, which is what I'm
[06:41.560 --> 06:47.300]  going to be talking about today. We have a bring your own device program already set up with Fitbit
[06:47.300 --> 06:52.520]  and Apple HealthKit connections. We're also developing integrated apps to track mood and
[06:52.520 --> 06:59.660]  cardiorespiratory fitness, among other things to come. So now shifting from the participant to the
[06:59.660 --> 07:04.600]  researcher, and how are we going to make these data available widely to the research community
[07:04.600 --> 07:09.780]  in a usable way? That sounds very easy, but it's not. So on the left is kind of the traditional
[07:09.780 --> 07:15.180]  model where a researcher would come to the program, apply for the data, and then the data would be
[07:15.180 --> 07:20.620]  basically shipped out to them. But with this model,
[07:20.620 --> 07:27.340]  it certainly discourages sharing of research. It also is a very weak link in terms of security.
[07:27.340 --> 07:32.660]  You have no control over those data once they leave your facility. You also need a high
[07:32.660 --> 07:38.300]  infrastructure to be able to support that to pay for multiple copies and all of that. So what we've
[07:38.300 --> 07:43.640]  decided to do is to create a cloud-centric approach where we bring researchers to the data.
[07:43.640 --> 07:47.680]  All of the data are hosted in the cloud, the tools are being built in the cloud,
[07:47.680 --> 07:53.080]  and then researchers are able to get access to it. And in this way, it facilitates collaboration.
[07:53.080 --> 07:57.640]  There are centralized security controls. We can make it accessible to all researchers,
[07:57.640 --> 08:02.940]  and we can even think of ways of making different types of data accessible to different types of
[08:02.940 --> 08:08.020]  researchers depending on the need. So with this cloud-centric model, you can also think of
[08:08.020 --> 08:15.340]  providing the data in tiers. We certainly have a public level of data sharing that's happening,
[08:15.340 --> 08:20.380]  publicly accessible, no login is needed, shared on the website. I showed you the bubbles of the
[08:20.380 --> 08:27.040]  enrollment over time. That's one of the snapshots. We've also created a registered tier, which
[08:27.700 --> 08:32.960]  requires identity verification, signing a data use agreement, and completing research ethics
[08:32.960 --> 08:39.880]  training. There's also the idea of potentially developing a controlled tier of data, which would
[08:40.360 --> 08:48.440]  give people even more access to the data. So at each step, where the security goes higher,
[08:48.440 --> 08:54.600]  and we are instituting more and more controls, because you're getting more and more access to
[08:54.600 --> 09:02.140]  data. So with the registered tier, there are some specific privacy controls that are in place there
[09:02.140 --> 09:06.920]  that would not be in place in the controlled tier. So we would need an additional trust with
[09:06.920 --> 09:13.140]  the users of that data. So here's the Researcher Hub website. You can go to researchallofus.org.
[09:13.140 --> 09:18.360]  I encourage everybody to do that. From there, you can see the data snapshots, the data browser,
[09:18.360 --> 09:23.880]  and the survey explorer. The data browser that's available publicly does not let you drill down
[09:23.880 --> 09:30.000]  to individual level data. To do that, you have to move into what's on the right here, the Researcher
[09:30.000 --> 09:35.760]  Workbench. So this is just how the Researcher Hub components fit together. On the left, you have
[09:35.760 --> 09:40.860]  the data browser, data snapshot, survey explorer. And then in the middle is this kind of data
[09:40.860 --> 09:45.380]  passport model, which is how we are making the data available to researchers. We don't require
[09:45.380 --> 09:52.000]  you to ask or tell us what specific project you're going to be working on. We just verify your
[09:52.000 --> 09:58.920]  identity and, you know, kind of build a level of trust with you as a researcher, and then give you
[09:58.920 --> 10:04.240]  access to the data. In the Researcher Workbench, it includes the data dictionary, concept sets,
[10:04.240 --> 10:10.080]  cohort builder, notebooks, and help desk. And the one exciting piece of news to share is that the
[10:10.080 --> 10:14.580]  All of Us Research Program Researcher Workbench is open for beta testing. So it's a very important
[10:14.580 --> 10:19.400]  moment for us. We want to build this platform together with the research community, but we
[10:19.400 --> 10:25.120]  absolutely need the input of researchers to make it more robust to learn how well the data tools
[10:25.120 --> 10:30.380]  and policies are working. So the current components in terms of data include physical
[10:30.380 --> 10:36.360]  measurements, survey data, some EHR data. Much more is to come here, including genomics and digital
[10:36.360 --> 10:41.600]  health technologies data. The tools that are included include a dataset builder, a cohort
[10:41.600 --> 10:47.620]  builder, Jupyter notebooks with R and Python supported, and the help resources are wide and
[10:47.620 --> 10:53.060]  varied, including FAQs, sample notebooks and workspaces, including code snippets, documentation,
[10:53.060 --> 10:59.060]  and help desk. So access and security, just a moment here. Again, All of Us employs a data
[10:59.060 --> 11:05.580]  passport model for access to this registered tier to grant researchers broad permission to
[11:05.580 --> 11:10.740]  explore the data for a wide range of studies. Beta testing is available to researchers who have
[11:10.740 --> 11:15.120]  an eRA Commons account. This is what we're using for identity verification. So you have to have an
[11:15.120 --> 11:20.600]  eRA Commons account. Most researchers do. And if not, you can look into the process for creating
[11:20.600 --> 11:25.460]  that. And then you also have to have a signed data use agreement with your institution.
[11:25.460 --> 11:30.700]  So the institution that houses the Data and Research Center signs an agreement with your institution,
[11:30.700 --> 11:34.900]  and then we verify your identity through the eRA Commons account, and then that's how you're able
[11:34.900 --> 11:40.260]  to get access. So again, to facilitate collaboration, the researcher workbench is
[11:40.260 --> 11:45.920]  hosted in a cloud-based system. Data downloads onto your local machine are not at all allowed.
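The access requirements described here (identity verification through an eRA Commons account, a data use agreement signed with the researcher's institution, and research ethics training) can be pictured as a simple checklist. The following is a minimal Python sketch under that assumption; the `Researcher` class, its field names, and both functions are hypothetical illustrations and are not part of any All of Us software.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Researcher:
    """Hypothetical record of the three registered-tier prerequisites."""
    has_era_commons_account: bool      # used for identity verification
    institution_signed_dua: bool       # data use agreement with the institution
    completed_ethics_training: bool    # research ethics training


def missing_requirements(r: Researcher) -> List[str]:
    """Return the prerequisites that are still outstanding, in a fixed order."""
    checks = [
        (r.has_era_commons_account, "eRA Commons identity verification"),
        (r.institution_signed_dua, "institutional data use agreement"),
        (r.completed_ethics_training, "research ethics training"),
    ]
    return [label for done, label in checks if not done]


def can_access_registered_tier(r: Researcher) -> bool:
    """Access is granted only when no prerequisite is missing."""
    return not missing_requirements(r)
```

Note that nothing in this checklist asks for a project description, which mirrors the data passport idea: trust is established with the researcher, not per project.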
[11:46.000 --> 11:50.360]  And during the beta testing phase, another really nice perk is that researchers will receive free
[11:50.360 --> 11:56.780]  credits to cover computing and storage costs. So here's just a peek at what the workspace
[11:56.780 --> 12:00.740]  looks like when you come to the researcher workbench. First, you create a workspace, and this
[12:00.740 --> 12:07.280]  is where everything kind of will be all collected. And then you can create cohorts, concept sets,
[12:07.280 --> 12:12.940]  cohort reviews, and notebooks to run your analyses. This is just a look at what the tools look like
[12:12.940 --> 12:20.320]  in terms of the cohorts, data sets, and concept sets. And then just one more summary about, you
[12:20.320 --> 12:26.480]  And I have to say, the support features are really extensive, so anyone who is not a coder
[12:26.480 --> 12:31.460]  coming to this should be able to even get up and running. So there's the user support hub,
[12:31.460 --> 12:36.060]  researcher workbench, featured workspaces, reusable code snippets,
[12:36.060 --> 12:40.760]  demonstration projects, and then a help desk form that can give you kind of one-on-one help.
[12:41.380 --> 12:46.040]  So just to summarize this section, we are providing a robust data ecosystem
[12:46.840 --> 12:51.840]  with survey data, physical measurements, structured EHR data, with plans to add additional
[12:51.840 --> 12:56.780]  surveys, digital health technologies data, unstructured EHR data, linkages to other data
[12:56.780 --> 13:04.080]  sets, additional assays and biospecimens, and genomics. So moving on to the topic of the day,
[13:04.080 --> 13:09.360]  digital health technologies in the All of Us Research Program. So what is the vision really
[13:09.360 --> 13:14.900]  for digital health technologies? Well, digital health technologies data from things like fitness
[13:14.900 --> 13:22.460]  trackers, smartwatches, apps, other types of devices, can capture health-relevant data from
[13:22.460 --> 13:28.220]  individuals outside the hospital or clinic, which can really supplement current medical information.
[13:28.520 --> 13:34.960]  Another feature, especially for fitness trackers and smartwatches, is that the data are longitudinal
[13:34.960 --> 13:39.860]  in nature, and it can really give researchers an unprecedented ability to better understand
[13:39.860 --> 13:46.060]  the impact of lifestyle, environment, and other factors on health outcomes, and eventually and
[13:46.060 --> 13:51.500]  ultimately develop strategies for keeping people healthy in a very precise and individualized way.
[13:51.620 --> 13:56.080]  The program is currently collecting data from participants in a bring-your-own-device model,
[13:56.080 --> 13:59.560]  so we are not distributing devices to participants, although that is something
[13:59.560 --> 14:04.420]  that we are planning for. And we currently have connections to Fitbit and Apple HealthKit.
[14:04.420 --> 14:09.960]  So again, just bring your own device. We are also considering developing pilots of
[14:09.960 --> 14:16.460]  smartphone-based apps in the future. So what is the vision or impact specifically for the Fitbit
[14:16.460 --> 14:22.080]  bring-your-own-device data in the Researcher Workbench? So I believe that All of Us
[14:22.080 --> 14:27.220]  really truly has a unique opportunity to release one of the largest and most diverse digital health
[14:27.220 --> 14:33.920]  technology data sets. We currently have data from over 9,000 participants, some beginning in 2009.
[14:33.920 --> 14:39.420]  In addition to current survey, EHR, and physical measurements, future data releases will also
[14:39.420 --> 14:43.720]  include genomics and other data sets. So when you think about the power of combining all of
[14:43.720 --> 14:48.940]  these different data types together, it's truly amazing. And then this can promote research to
[14:48.940 --> 14:53.160]  develop new and improved disease prevention capabilities, remote patient monitoring for
[14:53.160 --> 14:58.940]  connected care, and other opportunities for high-cost diseases. So the approach that we
[14:58.940 --> 15:02.660]  are using for the Fitbit data in the All of Us Research Program
[15:03.440 --> 15:09.500]  is this Minimum Viable Product, or MVP, approach. So first step of this is to obtain the data,
[15:09.800 --> 15:16.540]  and we have all of the data currently through May 2020 that's going to all be curated and released
[15:16.540 --> 15:22.600]  in the next data release. We've also developed scientific research use cases around the Fitbit
[15:22.600 --> 15:27.340]  data just to make sure that the way that we're curating the data is going to make it useful to
[15:27.340 --> 15:32.640]  the research community. We are now assessing the data set for the longitudinal nature,
[15:32.640 --> 15:37.020]  missingness, and trends, or basically doing exploratory data analysis. And then based on
[15:37.020 --> 15:42.440]  two and three, based on scientific research use cases and the EDA of the data, defining the
[15:42.440 --> 15:46.680]  priority data elements for inclusion in the Researcher Workbench. And again, the timeline
[15:46.680 --> 15:54.900]  for that is to release it in the next data release, which is slated for late fall 2020,
[15:55.460 --> 16:00.460]  currently anyway. And then to develop and implement a Fitbit curation schema coordinated
[16:00.460 --> 16:05.920]  with the security privacy team and other approvals as needed. So this is currently what the
[16:05.920 --> 16:11.880]  Fitbit MVP looks like. You can see the data elements table over on the right. The data
[16:11.880 --> 16:16.080]  elements that we are currently pulling from Fitbit include heart rate, minute level heart rate,
[16:16.080 --> 16:21.620]  heart rate summary data, intraday steps, which is also the minute level, daily activity summary,
[16:22.180 --> 16:26.260]  daily weight, daily fat, food, and water. And it's important to note that daily weight,
[16:26.260 --> 16:33.240]  daily fat, food, and water are all user-supplied elements. So just by nature of that,
[16:33.240 --> 16:38.540]  there is much less of those data. The others are parameters derived
[16:38.540 --> 16:45.800]  from the device. So what we've decided to do is to provide a series of data tables using BigQuery
[16:46.280 --> 16:51.740]  of parsed data tables, including heart rate summary by zone, heart rate at the minute level,
[16:51.740 --> 16:55.780]  activity daily summary, and activity intraday steps at the minute level.
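To illustrate how a minute-level table relates to a daily summary, here is a minimal Python sketch that rolls minute-level step rows up into per-day totals. The row layout (participant ID, timestamp, step count) and the function name are assumptions made for illustration; the actual column names and schema of the parsed Fitbit tables are not given in this talk.

```python
from collections import defaultdict
from datetime import date, datetime
from typing import Dict, Iterable, Tuple

# Hypothetical row shape for a minute-level steps table:
# (participant_id, minute timestamp, steps recorded in that minute)
IntradayRow = Tuple[str, datetime, int]


def daily_steps(rows: Iterable[IntradayRow]) -> Dict[Tuple[str, date], int]:
    """Sum minute-level step counts into one total per participant per day."""
    totals: Dict[Tuple[str, date], int] = defaultdict(int)
    for participant_id, timestamp, steps in rows:
        totals[(participant_id, timestamp.date())] += steps
    return dict(totals)


if __name__ == "__main__":
    sample = [
        ("p1", datetime(2020, 5, 1, 8, 0), 10),
        ("p1", datetime(2020, 5, 1, 8, 1), 15),
        ("p1", datetime(2020, 5, 2, 9, 0), 7),
    ]
    for (pid, day), total in sorted(daily_steps(sample).items()):
        print(pid, day.isoformat(), total)
```

In a hosted setting the same roll-up would more likely be written as a grouped query against the minute-level table; the in-memory version is used here only to keep the sketch self-contained.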
[16:56.440 --> 17:02.820]  So to dig into the data a little bit more, as well as some of the participant demographics,
[17:02.820 --> 17:09.200]  which are important for a number of reasons I'll go into. So these are characteristics of the
[17:09.200 --> 17:15.980]  current Fitbit data set in terms of the data elements. So we have 9,255 Fitbit bring your
[17:15.980 --> 17:24.500]  own device participants. And again, this is through the May 2020 timeframe. 8,535 of these
[17:24.500 --> 17:32.560]  have primary and electronic health record consent with activity measures as early as 2012 and heart
[17:32.560 --> 17:38.780]  rate as early as 2015. So that is quite a lot of data, if you think about it. The data elements
[17:38.780 --> 17:43.600]  that we are pulling from Fitbit API are shown on the right with the device derived parameters in
[17:43.600 --> 17:50.280]  green, daily activity, intraday activity, heart rate, and sleep. And the user supplied are in
[17:50.280 --> 17:55.680]  blue, daily fat, daily weight, food, and water. So when you look at the data corpus as a whole,
[17:55.680 --> 18:00.700]  and you look just by data element to see which elements have the most data in them,
[18:00.700 --> 18:05.500]  intraday steps and activity summary are the data elements with the most data. Heart rate and sleep
[18:05.500 --> 18:11.640]  summary have about 40% less, and the user supplied daily weight, daily fat, food, and water intake
[18:11.640 --> 18:17.640]  have greater than 90% less data than the intraday steps. So in terms of what is going to be most
[18:17.640 --> 18:25.920]  relevant or useful, you know, for giving to the research community, definitely the, in my mind,
[18:25.920 --> 18:34.200]  the intraday steps activity summary, as well as heart rate. So then we wanted to try to assess
[18:34.200 --> 18:40.580]  the completeness of each participant's data. So percent completeness was assessed within each
[18:41.220 --> 18:49.080]  of the 8,535 participants. And the way that we did this was to do a theoretical maximum
[18:49.080 --> 18:55.680]  kind of comparison, which is based on the start and end time of donation. So
[18:55.680 --> 19:00.260]  if you have a user that comes to you, and they started donating data, and then they ended donating
[19:00.260 --> 19:05.500]  data, and it was about 100 days, then your theoretical maximum number of data elements
[19:05.500 --> 19:12.440]  for the daily elements would be 100. For the minute level, it would be 100 times
[19:12.440 --> 19:18.640]  the number of minutes in a day, which is 1,440. So then by comparing the number of data elements
[19:18.640 --> 19:23.640]  that you have for each of those data types within a person, by the theoretical maximum, you can get
[19:23.640 --> 19:29.320]  this percent completeness. And we were looking at two thresholds, greater than 60% complete,
[19:29.320 --> 19:35.800]  and greater than 80% complete. So because intraday steps, daily activity, and heart rate are all,
[19:36.360 --> 19:42.860]  they're all device-derived data types, you would assume that they're all relatively similar,
[19:42.860 --> 19:48.800]  right? Because if the person is wearing the device versus not, you wouldn't get steps and
[19:48.800 --> 19:52.460]  not heart rate, right? Unless the device just didn't have that capability. So it's actually
[19:52.460 --> 19:57.160]  kind of a good control to see that these numbers are very similar. So for intraday steps, daily
[19:57.160 --> 20:04.020]  activity, and heart rate, we have about 77 to 79% of the participants that are donating data have
[20:04.020 --> 20:09.580]  more than 60% of the data compared to the theoretical maximum amount, which is pretty good.
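The completeness calculation just described can be sketched in a few lines of Python. This is an illustrative sketch, not the program's actual analysis code: the function names are hypothetical, and it assumes the donation window is measured in whole days between the first and last donation dates.

```python
from datetime import date, timedelta
from typing import Sequence

MINUTES_PER_DAY = 1440  # 24 hours * 60 minutes


def theoretical_max(start: date, end: date, per_minute: bool = False) -> int:
    """Maximum possible number of records in the donation window.

    Daily elements allow one record per day; minute-level elements allow
    1,440 records per day.
    """
    days = (end - start).days
    return days * MINUTES_PER_DAY if per_minute else days


def percent_complete(observed: int, start: date, end: date,
                     per_minute: bool = False) -> float:
    """Observed records as a percentage of the theoretical maximum."""
    maximum = theoretical_max(start, end, per_minute)
    if maximum <= 0:
        return 0.0  # no measurable donation window
    return 100.0 * observed / maximum


def share_above(percentages: Sequence[float], threshold: float) -> float:
    """Percentage of participants whose completeness exceeds the threshold."""
    if not percentages:
        return 0.0
    above = sum(1 for p in percentages if p > threshold)
    return 100.0 * above / len(percentages)


if __name__ == "__main__":
    start = date(2020, 1, 1)
    end = start + timedelta(days=100)  # a 100-day donation window
    print(theoretical_max(start, end))                    # daily elements
    print(theoretical_max(start, end, per_minute=True))   # minute-level elements
```

Applying `share_above` with thresholds of 60 and 80 to the per-participant percentages gives the kind of figures quoted in the talk, one pair per data type.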
[20:09.580 --> 20:14.500]  This is a pretty complete set, which is what that means. When you look at 80%, it's slightly lower,
[20:14.500 --> 20:21.940]  but it's still very good. Around 62 to 65% of the participants have more than 80% of the
[20:21.940 --> 20:27.960]  maximum possible data available. So an interesting thing to note, though,
[20:27.960 --> 20:35.700]  is that when we looked at how complete a participant who is only donating data for
[20:35.700 --> 20:40.200]  one year or less versus a participant that's donating several years of data,
[20:40.200 --> 20:45.140]  we noticed that participants with data available for only one year or less had the lowest percentage
[20:45.140 --> 20:51.840]  of completeness, and completeness of participant data increased as duration of BYOD data donation
[20:51.840 --> 20:56.560]  increases. So this just goes to show that, I think, something that we've all noticed that,
[20:56.560 --> 21:00.160]  you know, if you've had a device for more than one year, if you wear it regularly,
[21:00.160 --> 21:03.700]  you know, it kind of becomes integrated into your life, and so the chance of having a more
[21:03.700 --> 21:09.580]  complete data set goes up. So before I move into the next section where I talk a little bit about
[21:09.580 --> 21:20.180]  the demographics of the participants in the Fitbit BYOD program versus All of Us participants,
[21:20.180 --> 21:25.760]  I want to first just give you a summary of how we think about diversity in the All of Us
[21:25.760 --> 21:30.180]  research program. Diversity is critical to doing precision medicine research, because when you
[21:30.180 --> 21:35.420]  think about, you know, if we want to be able to deliver healthcare that's specifically for,
[21:35.420 --> 21:40.980]  you know, a specific type of person, we need to have research on all of those various types of
[21:40.980 --> 21:46.820]  people to be able to find those differences. So this is how we think about these groups that
[21:46.820 --> 21:51.900]  we call underrepresented in biomedical research and their definitions. So race and ethnicity,
[21:52.960 --> 22:00.000]  participants are identified as Hispanic, Latino, Spanish, race, or race identified other than white.
[22:00.000 --> 22:05.860]  Age, participant is under 18. Again, we're not doing that in this program yet. So over 65 is
[22:05.860 --> 22:10.800]  really what we focus on. Sexual and gender minorities, participant sex reported is different
[22:10.800 --> 22:15.880]  from that assigned at birth. Or orientation, sexual orientation is something other than straight.
[22:15.880 --> 22:21.560]  Income, the participant lives below the federal poverty level, which is less than $25,000 a year.
[22:21.580 --> 22:26.380]  Education, the participant has less than a high school degree. Geography, you live in a rural zip
[22:26.380 --> 22:31.440]  code. And then access to care, you can't readily access the healthcare system or can't pay for care.
[22:31.440 --> 22:35.780]  And then disability, some type of physical and or mental disability.
[22:36.900 --> 22:45.660]  So just first to start, I wanted to show you the breakdown of age group in the Fitbit BYOD
[22:45.660 --> 22:51.140]  versus All of Us core participants. So the Fitbit BYOD are in the dark bars,
[22:51.140 --> 22:57.340]  and All of Us are in the lighter bars. So what you can see here is that the distribution of age
[22:57.340 --> 23:03.980]  is fairly similar. You do lose a little bit of the 75 to 85, and this does fall into that UBR
[23:03.980 --> 23:12.620]  category. You do gain a little bit in the mid-20s to mid-30s. But overall, this is pretty comparable.
[23:13.120 --> 23:17.720]  So in the next chart, in this chart, I tried to highlight some of where we're seeing some
[23:17.720 --> 23:24.880]  of the gaps. And we're not able to capture UBRs in the same percentages with Fitbit participants
[23:24.880 --> 23:30.500]  versus All of Us participants. So you can see that the vast majority of Fitbit participants,
[23:30.500 --> 23:39.280]  over 78%, are white, whereas with All of Us, only about 47 to 48% are white.
[23:39.280 --> 23:48.120]  And then you have 20% Black or African American, 16% Hispanic or Latino, whereas in Fitbit BYOD,
[23:48.120 --> 23:56.980]  you only have 4 to 5% of those two UBR categories. Another feature of the Fitbit participants is that
[23:56.980 --> 24:04.780]  over 36% have an advanced degree. In All of Us, only 18% have an advanced degree.
[24:04.780 --> 24:10.920]  In terms of high school only, we've got about 21% in All of Us who have high school only degrees.
[24:11.060 --> 24:16.580]  And in Fitbit participants, it's only 6%. In terms of living below the poverty line,
[24:16.580 --> 24:21.900]  not surprisingly, we only have 8% of people donating data from that group,
[24:21.900 --> 24:30.820]  whereas in All of Us participants, we have 31%. So I wanted to really drive this point home that
[24:30.820 --> 24:35.240]  even though we're trying so hard, this program is so focused on diversity, we're trying really hard
[24:35.240 --> 24:40.500]  to capture the diversity, we can't do it in this kind of bring your own device model because this
[24:40.500 --> 24:44.820]  is probably reflecting people that have the devices, right? So we're going to have to be
[24:45.000 --> 24:49.640]  a little bit creative in order to try to capture the data from other groups. So just to summarize,
[24:49.640 --> 24:54.920]  there are specific diversity gaps in the Fitbit BYOD participants when compared with All of Us
[24:54.920 --> 24:59.820]  core participants. And skipping down to the bottom and the two bulleted points that are in bold,
[24:59.820 --> 25:04.660]  there's a definite bias towards white versus other race categories, and that Fitbit participants
[25:04.660 --> 25:10.520]  are skewed to higher education and income levels. So why is diversity important? Just one more piece
[25:10.520 --> 25:18.300]  of evidence. This is a very popular paper in Nature from 2016, which surveyed the data that
[25:18.300 --> 25:26.440]  has been used to do GWAS or genome-wide association studies from 2009 to 2016. And what it shows is
[25:26.440 --> 25:34.320]  that in 2009, 96% of GWAS studies were done using data from or using samples from people with
[25:34.320 --> 25:40.360]  European ancestry. That is a tremendous bias, right? I mean, this is something to be really,
[25:40.360 --> 25:47.360]  really careful about. By the time this paper was published in 2016, it had gone down to 81%.
[25:48.120 --> 25:54.980]  And it looks like, you know, in particular, people with Asian ancestry are becoming more represented,
[25:54.980 --> 26:00.620]  but we're doing a terrible job at getting diversity in this data set. And my worry,
[26:00.620 --> 26:06.460]  my concern is that if we're not careful in how we do research, you know, using these digital health
[26:06.460 --> 26:11.020]  technologies data, we're going to be in the same situation. And that's certainly not something
[26:11.020 --> 26:17.280]  that we want, because these types of gaps are what lead to disparities in healthcare.
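The demographic comparison quoted earlier in this section can be tabulated with a few lines of arithmetic. The following is a minimal sketch, not the program's actual analysis: the percentages are the rounded figures spoken in the talk, ranges such as "47, 48%" and "4 to 5%" are replaced by midpoints as an assumption, "over 78%" and "over 36%" are treated as 78 and 36, and the names `QUOTED_PCT` and `representation_gaps` are hypothetical.

```python
# Percentages as quoted in the talk: (Fitbit BYOD participants, All of Us core participants).
# Assumption: spoken ranges are replaced by midpoints ("47, 48%" -> 47.5, "4 to 5%" -> 4.5),
# and "over N%" figures are taken as N.
QUOTED_PCT = {
    "White": (78.0, 47.5),
    "Black or African American": (4.5, 20.0),
    "Hispanic or Latino": (4.5, 16.0),
    "Advanced degree": (36.0, 18.0),
    "High school only": (6.0, 21.0),
    "Below poverty line": (8.0, 31.0),
}


def representation_gaps(figures):
    """Return one row per group with the point difference and the BYOD/core ratio."""
    rows = []
    for group, (byod_pct, core_pct) in figures.items():
        rows.append(
            {
                "group": group,
                "byod_pct": byod_pct,
                "core_pct": core_pct,
                "diff_points": byod_pct - core_pct,  # percentage-point gap
                "ratio": byod_pct / core_pct,        # < 1 means underrepresented in BYOD
            }
        )
    return rows


if __name__ == "__main__":
    # Print the most underrepresented groups first.
    for row in sorted(representation_gaps(QUOTED_PCT), key=lambda r: r["ratio"]):
        print(
            f"{row['group']:<28} BYOD {row['byod_pct']:5.1f}%  "
            f"core {row['core_pct']:5.1f}%  "
            f"diff {row['diff_points']:+6.1f}  ratio {row['ratio']:.2f}"
        )
```

A ratio below 1 marks a group that is less represented among Fitbit BYOD participants than in the core cohort, and a ratio above 1 marks overrepresentation. With the figures as quoted, the advanced-degree share in BYOD is twice the core share.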
[26:17.900 --> 26:23.500]  So what is the opportunity and some considerations for DHT in the All of Us Research Program? So
[26:23.500 --> 26:28.380]  the All of Us Research Program truly has a unique opportunity to address some of the current
[26:28.380 --> 26:34.060]  challenges in DHT research due to the following core values of the program. First, the size and
[26:34.060 --> 26:39.620]  diversity of the cohort. Again, we're trying to get to a million. We've currently got more than
[26:39.620 --> 26:50.680]  350,000. And the focus on diversity means that we really do have a need to focus on diversity,
[26:50.680 --> 26:54.780]  not just at the level of the participants, but also at the level of the data type.
[26:55.840 --> 27:00.580]  It's also important to the program to capture data from those UBR groups,
[27:00.580 --> 27:05.280]  and the goal to develop an open access, rich, and complete biomedical data set
[27:05.280 --> 27:10.360]  that's not biased, and make it available to the research community. So that's really key.
[27:10.920 --> 27:15.940]  So because diversity of participants and of data is of critical importance to the All of Us
[27:15.940 --> 27:22.780]  Program, this information that we've learned from the BYOD analysis will be used to plan future
[27:22.780 --> 27:29.140]  digital health technology efforts, potentially piloting, giving out devices to people,
[27:29.800 --> 27:34.680]  to try to mitigate gaps in the traditionally underrepresented in biomedical research
[27:34.680 --> 27:41.720]  participants. So really the question that I want to get to, and that I want people to think about,
[27:41.720 --> 27:46.920]  and to hopefully spark some discussion, is how can the All of Us Program leverage DHT
[27:46.920 --> 27:51.460]  to provide unique opportunities for our participants and for the research community?
[27:52.040 --> 27:57.460]  So just some other key considerations for All of Us and digital health technologies. Again,
[27:57.460 --> 28:01.940]  diversity is key. I can't say it enough. Even in the All of Us Research Program, where there's a
[28:01.940 --> 28:07.240]  huge focus on diversity, it's a challenge to get that diversity in the digital health technology
[28:07.240 --> 28:14.300]  data. But it's critical and key for eliminating bias from results. You know, the All of Us
[28:14.300 --> 28:18.680]  Research Program, because we have the opportunity to release one of the richest data sets and
[28:18.680 --> 28:23.980]  largest data sets with the Fitbit release, it's an opportunity to democratize access to this data
[28:23.980 --> 28:28.840]  type. You know, there are a lot of research studies that have just a very few number of participants,
[28:28.840 --> 28:34.260]  or for the larger studies with lots more data, they're mostly in the private sector.
[28:34.260 --> 28:40.860]  So how can we, you know, safely enable a larger number of researchers to
[28:40.860 --> 28:44.940]  get familiar with these data types and start to do innovative things with them?
[28:45.660 --> 28:50.260]  The other thing is that, you know, in a lot of cases, these are novel data types.
[28:50.260 --> 28:54.740]  So a lot of what has been built for privacy and security is unknown until proven, right?
[28:54.740 --> 29:01.680]  And so certainly data egress is something that the program guards against and is critical.
[29:01.680 --> 29:06.480]  And then with privacy, re-identification of participants is also another consideration
[29:07.100 --> 29:14.980]  that we need to make sure that we are protecting for. So I'm just going to show you slides to let
[29:14.980 --> 29:19.300]  you know that it really does take all of us. These are all of the All of Us consortium members
[29:19.300 --> 29:24.720]  beyond the community partners. You may or may not be able to read all of these
[29:24.720 --> 29:30.380]  organizations. It's truly massive. These are the community partner and provider networks.
[29:31.400 --> 29:37.980]  And here's a picture of All of Us in there somewhere. And it really is truly a monumental
[29:37.980 --> 29:43.840]  effort. So please feel free to find out more information, shoot me a note, ask a question,
[29:43.840 --> 29:46.140]  discuss, and thank you so much for your time.
